{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "771a2164-b860-415c-9a88-14126702e163",
   "metadata": {},
   "source": [
    "*Copyright - L. Siddharth, Singapore University of Technology and Design* (siddharthl.iitrpr.sutd@gmail.com)\n",
    "\n",
    "__Short guide to using extracting design knowledge from patent text__\n",
    "based on the following research:\n",
    "Siddharth, L., Luo, J., 2024. Retrieval-Augmented Generation using Engineering Design Knowledge. (cs.CL) https://arxiv.org/abs/2307.06985"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cb8e801-70f8-431e-9a34-edfef5bb176b",
   "metadata": {},
   "source": [
    "__Package Installation__\n",
    "Please install the following packages in the desired Python environment."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ef0a4b9d-98b3-4cf3-a523-429685a90721",
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import clear_output\n",
    "\n",
    "!pip install spacy[transformers]\n",
    "!pip install spacy torch patoolib bs4\n",
    "!python -m spacy download en_core_web_sm\n",
    "!python -m spacy download en_core_web_trf\n",
    "\n",
    "clear_output()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ed0af26-38ba-4a78-870b-5f5aae36488f",
   "metadata": {},
   "source": [
    "__Module Import__\n",
    "The package modules shall be imported as follows. The underlying trained transformer models will be downloaded during first time import.\n",
    "Please ensure that the package folder \"design_kgex\" is placed in the sample directory as the current working directory.\n",
    "The package can be downloaded from GitHub - *https://github.com/siddharthl93/engineering-design-knowledge*"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "946966b1-6396-47ab-b6a7-0c4b8f3518c5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading entity_relation_tagger.rar...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "eafe36f36149417395da62abece232af",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/434907286 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO patool: Extracting entity_relation_tagger.rar ...\n",
      "INFO patool: running \"C:\\Program Files\\WinRAR\\rar.EXE\" x -- C:\\Users\\jonny\\Desktop\\edk\\entity_relation_tagger.rar\n",
      "INFO patool:     with cwd=design_kgex, input=\n",
      "INFO patool: ... entity_relation_tagger.rar extracted to `design_kgex'.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Downloading relation_identifier.rar...\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "0a9be986b66e45f8afc03adcd86ea05f",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/435547086 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO patool: Extracting relation_identifier.rar ...\n",
      "INFO patool: running \"C:\\Program Files\\WinRAR\\rar.EXE\" x -- C:\\Users\\jonny\\Desktop\\edk\\relation_identifier.rar\n",
      "INFO patool:     with cwd=design_kgex, input=\n",
      "INFO patool: ... relation_identifier.rar extracted to `design_kgex'.\n"
     ]
    }
   ],
   "source": [
    "from design_kgex import patent_text, design_knowledge"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aeb43034-d64c-47e3-bfa3-deca970f32e5",
   "metadata": {},
   "source": [
    "__Getting patent sentences__\n",
    "First, let us use the module \"patent_text\" to get the list of formatted sentences from a patent. For this purpose, we will use an example patent as follows.\n",
    "Card for textile fibers with carding cylinders cooperating in series - *https://patents.google.com/patent/US5974628A/*\n",
    "\n",
    "From the above patent, we will input the patent number as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "dfef3e5d-b3df-4be5-8f6b-865963bde65f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The patent 7520356 includes 209 sentences. Some sample sentences are as follows.\n",
      "\n",
      "\n",
      " - a hinge element fixed to said support frame, said hinge element adapted to couple with a hinge element of a second suction module, whereby said suction modules can be angularly oriented with respect to each other.\n",
      "\n",
      " - The impeller has an axis of rotation and is adapted to draw air from the vacuum chamber into the impeller in a direction generally parallel to the impeller axis of rotation.\n",
      "\n",
      " - Referring first to FIGS 1 and 2, the wall climbing robot 10 of the present invention generally includes at least two suction modules 11 and 12 pivotably connected together by a hinge assembly consisting of a bracket 13 and hinge 14 arrangement.\n",
      "\n",
      " - The flexible joint 52 also allows the robot to maneuver over uneven surfaces with obstacles.\n",
      "\n",
      " - The drive wheels 18 rotate about a single common axis, while the castor wheel 20 is permitted to rotate about multiple axes.\n",
      "\n"
     ]
    }
   ],
   "source": [
    "import random\n",
    "\n",
    "patent_number = \"7520356\"\n",
    "sentences = patent_text.getPatentText(patent_number)\n",
    "\n",
    "print(f\"The patent {patent_number} includes {len(sentences)} sentences. Some sample sentences are as follows.\\n\\n\")\n",
    "random.shuffle(sentences)\n",
    "\n",
    "for sent in sentences[:5]:\n",
    "    print(\" - \" + sent + \"\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "654e8948-0322-4f3e-9dde-067309a7278f",
   "metadata": {},
   "source": [
    "__Extracting design knowledge__\n",
    "Next, let us utilise the \"design_knowledge\" module to extract engineering design knowledge from the sentences thus obtained. The following code will return knowledge as a list, wherein, each item is a dictionary pertaining to the following format. \n",
    "{\n",
    "    __\"sentence\"__: \"...\", \n",
    "    __\"entities\"__: \n",
    "    [\"entity #1\", \n",
    "     \"entity #1\"...\n",
    "    ], \n",
    "    __\"facts\"__: \n",
    "    [\n",
    "        [\"head entity\", \"relationship\", \"tail entity\"], \n",
    "        [\"head entity\", \"relationship\", \"tail entity\"]...\n",
    "    ]\n",
    "}\n",
    "\n",
    "__Note__: It is preferable that the following code is executed in a GPU environment if several sentences are input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "da1a61ab-7f97-4ff9-9b91-cac3090751f5",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sorry, no GPU is available! Processing will be performed in normal time.\n"
     ]
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "83d439a035e444c684a760ddbfbc7a1d",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "  0%|          | 0/3 [00:00<?, ?it/s]"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "sentence -\n",
      "\n",
      "a hinge element fixed to said support frame, said hinge element adapted to couple with a hinge element of a second suction module, whereby said suction modules can be angularly oriented with respect to each other. \n",
      "\n",
      "entities -\n",
      "\n",
      "['said suction modules', 'a second suction module', 'a hinge element', 'said hinge element', 'respect', 'said support frame'] \n",
      "\n",
      "facts -\n",
      "\n",
      "[['a hinge element', 'fixed to', 'said support frame'], ['a hinge element', 'fixed', 'said hinge element'], ['said hinge element', 'adapted to couple with', 'a hinge element'], ['a hinge element', 'of', 'a second suction module'], ['a second suction module', 'whereby', 'said suction modules'], ['said suction modules', 'angularly oriented with', 'respect']] \n",
      "\n",
      "\n",
      "---------------------\n",
      "\n",
      "sentence -\n",
      "\n",
      "The impeller has an axis of rotation and is adapted to draw air from the vacuum chamber into the impeller in a direction generally parallel to the impeller axis of rotation. \n",
      "\n",
      "entities -\n",
      "\n",
      "['rotation', 'the vacuum chamber', 'the impeller', 'an axis', 'air', 'the impeller axis', 'a direction'] \n",
      "\n",
      "facts -\n",
      "\n",
      "[['the impeller', 'has', 'an axis'], ['the impeller', 'is adapted to draw', 'air'], ['an axis', 'of', 'rotation'], ['air', 'from', 'the vacuum chamber'], ['air', 'into', 'the impeller'], ['air', 'in', 'a direction'], ['the vacuum chamber', 'into', 'the impeller'], ['the impeller', 'in', 'a direction'], ['a direction', 'generally parallel to', 'the impeller axis'], ['the impeller axis', 'of', 'rotation']] \n",
      "\n",
      "\n",
      "---------------------\n",
      "\n",
      "sentence -\n",
      "\n",
      "Referring first to FIGS 1 and 2, the wall climbing robot 10 of the present invention generally includes at least two suction modules 11 and 12 pivotably connected together by a hinge assembly consisting of a bracket 13 and hinge 14 arrangement. \n",
      "\n",
      "entities -\n",
      "\n",
      "['the wall climbing robot', 'at least two suction modules', 'a hinge assembly', 'a bracket 13 and hinge 14 arrangement', 'the present invention'] \n",
      "\n",
      "facts -\n",
      "\n",
      "[['the wall climbing robot', 'generally includes', 'at least two suction modules'], ['the wall climbing robot', 'connected by', 'a hinge assembly'], ['at least two suction modules', 'pivotably connected together by', 'a hinge assembly'], ['a hinge assembly', 'consisting of', 'a bracket 13 and hinge 14 arrangement']] \n",
      "\n",
      "\n",
      "---------------------\n",
      "\n"
     ]
    }
   ],
   "source": [
    "example_sentences = sentences[:3]\n",
    "extracted_knowledge = design_knowledge.extractDesignKnowledge(example_sentences)\n",
    "\n",
    "for item in extracted_knowledge:\n",
    "    for key in item:\n",
    "        print(key, \"-\\n\")\n",
    "        print(item[key], \"\\n\")\n",
    "    print(\"\\n---------------------\\n\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "453a17ad-8b42-449c-b082-bdab32d51960",
   "metadata": {},
   "source": [
    "In the above output,\n",
    "\n",
    "Entities are subsets of noun-phrases in the sentence. The appropriate ones that communicate design knowledge are identified by the models that we have trained included in the package.\n",
    "\n",
    "Facts associate a pair of the above list entities using a relationship that is communicated in the sentence. The fact is given in the form of a triple \"head entity, relationship, tail entity\".\n",
    "The above exracted facts constitute a graph that represents design knowledge extracted from a list of sentences. To visualise the graph, various libraries like networkx or vis.js.\n",
    "\n",
    "For any queries, please write to *siddharthl.iitrpr.sutd@gmail.com*."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e5522d5d-e44c-4b0a-969e-3d4ff096a47f",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:base] *",
   "language": "python",
   "name": "conda-base-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
